Journal of KIISE
Korean Title |
An Effective Document Clustering Method Using Space Transformation Based on LDA and WMD |
English Title |
An Efficient Document Clustering Method using Space Transformation based on LDA and WMD |
Author |
김용담
정성원
Yongdam Kim
Sungwon Jung
|
Citation |
VOL 48 NO. 09 PP. 1052 ~ 1060 (2021. 09) |
Çѱ۳»¿ë (Korean Abstract) |
±âÁ¸ÀÇ TF-IDF ±â¹ÝÀÇ ¹®¼ Ŭ·¯½ºÅ͸µ ±â¹ýÀº ¹®¼ÀÇ ¹®¸Æ Á¤º¸ÀÎ co-occurrence¿Í wordorder¿¡ ´ëÇÑ Á¤º¸¸¦ ÃæºÐÈ÷ È°¿ëÇÏÁö ¸øÇÏ°í, |
English Abstract |
Existing TF-IDF-based document clustering methods do not properly exploit the contextual information of documents, i.e., co-occurrence and word order, and their performance tends to degrade due to the curse of dimensionality. To overcome these problems, techniques such as a weighted average of word embedding vectors or Word Mover's Distance (WMD) have been proposed. These techniques perform well in document classification, but not in document clustering, which requires grouping documents. In this study, we use LDA to define a topic document, which serves as the representative document of a document group, and solve the existing problem by calculating the WMD against the topic documents. However, since WMD requires a large amount of computation, we propose a space transformation method that achieves good performance while reducing the computation cost by mapping each document to a low-dimensional space in which each axis represents the WMD value from one topic document. |
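The space transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the 2-D embeddings in `EMBEDDINGS` are hypothetical toy values, the topic documents are written by hand rather than produced by LDA, and the exact WMD is replaced by the relaxed WMD lower bound (each word moves its whole weight to the nearest word in the other document) so that the example needs only the Python standard library. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical 2-D word embeddings; real ones would come from a trained model.
EMBEDDINGS = {
    "goal": (1.0, 0.0),
    "match": (0.9, 0.1),
    "team": (0.8, 0.2),
    "vote": (0.0, 1.0),
    "election": (0.1, 0.9),
    "party": (0.2, 0.8),
}


def nbow(tokens):
    """Normalized bag-of-words weights for tokens that have an embedding."""
    counts = Counter(t for t in tokens if t in EMBEDDINGS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document has no words with known embeddings")
    return {word: count / total for word, count in counts.items()}


def one_sided_cost(src, dst):
    """Move each source word's whole weight to its nearest word in dst."""
    return sum(
        weight * min(math.dist(EMBEDDINGS[w], EMBEDDINGS[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b):
    """Relaxed WMD, a lower bound on exact WMD (stand-in for the real thing)."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


def transform(doc, topic_docs):
    """Map a document to a vector with one distance per topic document."""
    return [relaxed_wmd(doc, topic) for topic in topic_docs]


# Hand-written stand-ins for LDA topic documents (e.g. top words per topic).
topic_docs = [["goal", "match", "team"], ["vote", "election", "party"]]
docs = [["team", "goal", "goal"], ["party", "vote"], ["match", "team"]]

# Each document becomes a point in a space with len(topic_docs) axes.
vectors = [transform(d, topic_docs) for d in docs]

# Simplest possible grouping: pick the nearest topic axis.
# This step is an assumption for illustration, not the paper's procedure.
labels = [min(range(len(v)), key=v.__getitem__) for v in vectors]
```

With these toy inputs, each document is represented by only two numbers, one per topic document, and any standard clustering algorithm could be run on `vectors` in place of the nearest-axis rule used here. Each document needs only as many distance computations as there are topic documents, rather than one per pair of documents.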
Keyword |
document clustering
word mover's distance
space transformation technique
word embedding
|